
    Seeing is believing: the importance of visualization in real-world machine learning applications

    The increasing availability of data sets with a huge amount of information, coded in many different features, justifies research on new methods of knowledge extraction. The great challenge is translating raw data into useful information that can be used to improve decision-making processes, detect relevant profiles, find relationships among features, etc. It is undoubtedly true that a picture is worth a thousand words, which makes visualization methods likely the most appealing, and among the most relevant, kinds of knowledge extraction methods. At ESANN 2011, the special session "Seeing is believing: The importance of visualization in real-world machine learning applications" reflects some of the main emerging topics in the field. This tutorial prefaces the session, summarizing some of its contributions, while also providing some clues to the current state and near future of visualization methods within the framework of machine learning.
    Postprint (published version).

    Classification, dimensionality reduction, and maximally discriminatory visualization of a multicentre 1H-MRS database of brain tumors

    The combination of an Artificial Neural Network classifier, a feature selection process, and a novel linear dimensionality reduction technique is applied in this study to the analysis of an international, multi-centre 1H-MRS database of brain tumors. The dimensionality reduction technique provides a data projection for visualization that completely preserves the class discrimination achieved by the classifier. This combination yields results that are both intuitively interpretable and very accurate. The method as a whole remains simple enough to allow its easy integration into existing medical decision support systems.
    Peer Reviewed. Postprint (published version).
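
    The paper's own discrimination-preserving projection is not reproduced here. As a hedged illustration of the general idea of choosing a linear projection for class separation, the sketch below swaps in the classical two-class Fisher linear discriminant, written in plain Python. It computes w proportional to S_w^{-1}(m1 - m0) and projects the data onto w. The function names, the small ridge term and the toy data are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean(rows: Sequence[Sequence[float]]) -> Vector:
    """Column-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def within_class_scatter(classes: Sequence[Sequence[Sequence[float]]]) -> List[Vector]:
    """Sum over classes of the scatter of each sample around its class mean."""
    d = len(classes[0][0])
    s = [[0.0] * d for _ in range(d)]
    for rows in classes:
        m = mean(rows)
        for r in rows:
            diff = [r[j] - m[j] for j in range(d)]
            for a in range(d):
                for b in range(d):
                    s[a][b] += diff[a] * diff[b]
    return s


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_direction(class0, class1, ridge: float = 1e-6) -> Vector:
    """Fisher direction w ~ S_w^{-1} (m1 - m0); the ridge keeps S_w invertible."""
    sw = within_class_scatter([class0, class1])
    for i in range(len(sw)):
        sw[i][i] += ridge
    m0, m1 = mean(class0), mean(class1)
    return solve(sw, [b - a for a, b in zip(m0, m1)])


def project(rows, w: Vector) -> Vector:
    """1-D coordinates of each sample along w."""
    return [sum(x * wi for x, wi in zip(r, w)) for r in rows]


# Hypothetical toy data: two well-separated 2-D classes.
class0 = [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]]
class1 = [[3.0, 3.0], [4.0, 3.5], [3.5, 4.0]]
w = fisher_direction(class0, class1)
p0, p1 = project(class0, w), project(class1, w)
```

    On this toy data, every projected class-0 point falls below every projected class-1 point. Unlike the method described in the abstract, Fisher's discriminant is fitted independently of any neural network classifier.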

    Clustering breast cancer data by consensus of different validity indices

    Clustering algorithms generally either partition a given data set into a pre-specified number of clusters or produce a hierarchy of clusters. In this paper we analyse several different clustering techniques and apply them to a particular breast cancer data set. Because we do not know a priori the best number of groups, we use a range of validity indices to assess the quality of the clustering results and to determine the best number of clusters. For the K-means method there is no absolute agreement among the indices as to the best number of clusters, whereas for the PAM algorithm all the indices indicate 4 as the best number of clusters.
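
    The consensus idea can be sketched in plain Python, under stated assumptions. The sketch uses K-means only (PAM is omitted) and two common validity indices, silhouette width (higher is better) and Davies-Bouldin (lower is better). These are not necessarily the indices used in the paper. Each index casts one vote for its preferred number of clusters. The function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's K-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels: Optional[List[int]] = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette(points: Sequence[Point], labels: List[int]) -> float:
    """Mean silhouette width (higher is better); singletons score 0."""
    if len(set(labels)) < 2:
        return -1.0  # undefined for a single cluster; treat as worst
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(points: Sequence[Point], labels: List[int]) -> float:
    """Davies-Bouldin index (lower is better)."""
    groups = sorted(set(labels))
    if len(groups) < 2:
        return math.inf
    cent, spread = {}, {}
    for c in groups:
        members = [p for p, lab in zip(points, labels) if lab == c]
        cent[c] = tuple(sum(x) / len(members) for x in zip(*members))
        spread[c] = sum(math.dist(p, cent[c]) for p in members) / len(members)
    worst = [
        max((spread[c] + spread[o]) / max(math.dist(cent[c], cent[o]), 1e-12)
            for o in groups if o != c)
        for c in groups
    ]
    return sum(worst) / len(worst)


def consensus_k(points: Sequence[Point], ks: Sequence[int]) -> Counter:
    """Each validity index votes for the number of clusters it prefers."""
    labels = {k: kmeans(points, k) for k in ks}
    votes: Counter = Counter()
    votes[max(ks, key=lambda k: silhouette(points, labels[k]))] += 1
    votes[min(ks, key=lambda k: davies_bouldin(points, labels[k]))] += 1
    return votes


# Hypothetical toy data: three well-separated square blobs.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11),
       (20, 0), (20, 1), (21, 0), (21, 1)]
votes = consensus_k(pts, range(2, 6))
```

    With real data the indices often disagree, as the abstract reports for K-means. A vote count exposes that disagreement instead of hiding it behind a single criterion.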

    Making nonlinear manifold learning models interpretable: The manifold grand tour

    Dimensionality reduction is required to produce visualisations of high-dimensional data. In this framework, one of the most straightforward approaches to visualising high-dimensional data is to reduce complexity by applying linear projections while tumbling the projection axes in a defined sequence, which generates a Grand Tour of the data. We propose using smooth nonlinear topographic maps of the data distribution to guide the Grand Tour. This increases the effectiveness of the approach by prioritising the linear views of the data that are most consistent with the global data structure captured in these maps. A further consequence of this approach is that it enables direct visualisation of the topographic map onto projective spaces that discern structure in the data. The experimental results on standard databases reported in this paper, using self-organising maps and generative topographic mapping, illustrate the practical value of the proposed approach. The main novelty of our proposal is the definition of a systematic way to guide the search for data views in the Grand Tour, selecting and prioritising some of them based on nonlinear manifold models.
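
    A rough sketch of the tumbling-axes idea follows, under loud assumptions. It is a simple rotation-based tour, not the authors' method. A 2-D orthonormal frame is rotated by Givens rotations through successive coordinate planes, and each frame defines one linear view. To mimic "prioritising views" guided by a nonlinear map, the views are ranked by a hypothetical score: the share of pairwise squared distance between map prototypes (for example, SOM codebook vectors) that survives the projection. The score and all names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def givens(frame: List[Vec], i: int, j: int, theta: float) -> List[Vec]:
    """Rotate every frame vector in the (i, j) coordinate plane by theta.

    The same orthogonal rotation is applied to both vectors, so the
    frame stays orthonormal.
    """
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for v in frame:
        w = v[:]
        w[i], w[j] = c * v[i] - s * v[j], s * v[i] + c * v[j]
        out.append(w)
    return out


def tour_frames(dim: int, steps: int, theta: float) -> List[List[Vec]]:
    """Sequence of 2-D frames, tumbling through the coordinate planes in turn."""
    frame = [[1.0 if k == 0 else 0.0 for k in range(dim)],
             [1.0 if k == 1 else 0.0 for k in range(dim)]]
    planes = [(i, j) for i in range(dim) for j in range(i + 1, dim)]
    frames = [frame]
    for t in range(steps):
        i, j = planes[t % len(planes)]
        frame = givens(frame, i, j, theta)
        frames.append(frame)
    return frames


def project(points: Sequence[Sequence[float]], frame: List[Vec]) -> List[Tuple[float, ...]]:
    """2-D coordinates of each point in the view defined by the frame."""
    return [tuple(sum(p[k] * v[k] for k in range(len(p))) for v in frame) for p in points]


def spread_retained(prototypes: Sequence[Sequence[float]], frame: List[Vec]) -> float:
    """Hypothetical view score: share of prototype pairwise squared distance kept."""
    proj = project(prototypes, frame)
    kept = total = 0.0
    for a in range(len(prototypes)):
        for b in range(a + 1, len(prototypes)):
            total += sum((x - y) ** 2 for x, y in zip(prototypes[a], prototypes[b]))
            kept += sum((x - y) ** 2 for x, y in zip(proj[a], proj[b]))
    return kept / total if total else 0.0


def prioritised_views(prototypes, dim: int, steps: int = 30, theta: float = math.pi / 12):
    """Tour frames sorted from most to least structure-preserving view."""
    frames = tour_frames(dim, steps, theta)
    return sorted(frames, key=lambda f: spread_retained(prototypes, f), reverse=True)


# Hypothetical prototypes that vary only along the third axis.
protos = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (0.0, 0.0, 10.0)]
views = prioritised_views(protos, 3)
```

    The initial (x, y) view hides all the prototype structure in this example. Ranking moves views that tilt towards the third axis to the front.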

    Coastal zone management in the fisheries sector program

    This paper presents an analysis of censored survival data for breast cancer-specific mortality and disease-free survival. The process has three stages: time-to-event modelling, risk stratification by predicted outcome, and model interpretation using rule extraction. Model selection was carried out using the benchmark linear model, Cox regression. Risk staging, however, was derived both with Cox regression and with Partial Logistic Regression Artificial Neural Networks regularised with Automatic Relevance Determination (PLANN-ARD). The analysis compares the two approaches and shows the benefit of the neural network framework, especially for patients at high risk. The neural network model also yields a smooth model of the hazard without requiring limiting assumptions of proportionality. The model predictions were verified using out-of-sample testing. The mortality model was also compared with two other prognostic models, TNG and the NPI rule model. Further verification was carried out by comparing marginal estimates of the predicted and actual cumulative hazards. Doctors appear to treat mortality and disease-free models as equivalent, so a further analysis was performed to check whether this is the case. The analysis was extended with automatic rule generation using Orthogonal Search Rule Extraction (OSRE). This methodology translates analytical risk scores into the language of the clinical domain, enabling direct validation of the operation of the Cox or neural network model. This paper extends the existing OSRE methodology to data sets that include continuous-valued variables.
    International audience.
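
    Partial logistic regression networks are commonly fitted on censored data replicated once per time interval. The interval index is an extra input, and a logistic output estimates the discrete-time hazard in that interval. The sketch below, assuming that setup, shows only the data expansion and the conversion of per-interval hazards into a survival curve via S(t_k) = prod_{j<=k}(1 - h_j). The network itself and the ARD regularisation are not shown, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: (covariates, last observed interval index, event observed?)
Record = Tuple[Sequence[float], int, bool]


def expand_person_period(records: Sequence[Record],
                         n_intervals: int) -> List[Tuple[List[float], int]]:
    """Replicate each patient once per interval at risk.

    Each row is ([interval index] + covariates, target). The target is 1 only
    in the interval where the event occurred; censored patients contribute
    0s up to their last observed interval.
    """
    rows = []
    for x, last, event in records:
        for k in range(min(last, n_intervals - 1) + 1):
            target = 1 if (event and k == last) else 0
            rows.append(([float(k)] + list(x), target))
    return rows


def survival_from_hazards(hazards: Sequence[float]) -> List[float]:
    """Survival curve S(t_k) = prod_{j<=k} (1 - h_j) from discrete-time hazards."""
    s, curve = 1.0, []
    for h in hazards:
        s *= 1.0 - h
        curve.append(s)
    return curve


# Hypothetical example: one patient with an event in interval 2, one censored after interval 1.
rows = expand_person_period([([0.5], 2, True), ([1.0], 1, False)], n_intervals=4)
curve = survival_from_hazards([0.1, 0.2])
```

    Because the interval index is a network input, the fitted hazard can change shape over time. This is how such models avoid the proportional-hazards assumption mentioned above.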

    Explainable Inflation Forecasts by Machine Learning Models

    Forecasting inflation accurately in a data-rich environment is a challenging task and an active research field that still contains various unanswered methodological questions. One of them is how to find and extract the information with the most predictive power for a variable of interest when there are many highly correlated predictors, as in the inflation forecasting problem. Traditionally, factor models have been used to tackle this problem. However, a few recent studies have revealed that machine learning (ML) models such as random forests may offer valuable solutions to the problem. This study encourages greater use of ML models, with or without factor models. It does so either by replacing the functional form of the forecast equation in a factor model with ML models or by employing them directly with several feature selection techniques. In light of recent findings in the literature, this study also adds new tree-based models to the analysis. Moreover, it proposes integrating feature selection techniques with Shapley values to obtain concise explanations of the inflation predictions. The results of a comprehensive set of experiments on an emerging economy, Turkey, which faces a high degree of volatility and uncertainty, indicate that tree-based ensemble models can be advantageous, providing better accuracy together with explainable predictions.
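
    To make the Shapley-value step concrete, here is a minimal brute-force sketch in plain Python. It is not the study's implementation. Features outside a coalition are replaced by a baseline value, which is one of several possible conventions and an assumption here. The cost grows exponentially with the number of features, so it is only practical for a handful of features, for example after feature selection. Tree-specific algorithms are much faster for the tree ensembles discussed above. The function name and example model are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of prediction f(x), relative to f(baseline).

    Features outside a coalition take their baseline value (an assumption).
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


# Hypothetical linear "forecast model": each attribution should equal w_i * (x_i - b_i).
linear = lambda z: 2 * z[0] + 3 * z[1] - z[2]
phi = shapley_values(linear, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

    The attributions always sum to f(x) - f(baseline), which makes them easy to read as a decomposition of a single inflation prediction.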

    Machine learning in human cancer research

    Evidence-based medicine has grown in stature over three decades and is now regarded as a key development of modern medicine. The evidence base can be heterogeneous, involving both qualitative knowledge and measured quantitative data. Data analysis in cancer research has long been the domain of statisticians, but over the last decade Machine Learning (ML) methods have also begun to establish themselves as an alternative and promising approach to computer-based data analysis in oncology. In this chapter, we provide a state-of-the-art overview of the main areas of cancer research in which ML methods are currently being applied, and discuss some of the advantages and disadvantages of their application. We also comment on and illustrate the integration of ML methods into clinical oncology decision support systems.
    Postprint (published version).

    Investigating human cancer with computational intelligence techniques

    Driven by the growing demand for personalized medical procedures, data-based, computer-aided cancer research in human patients is advancing at an accelerating pace. This provides a broadening landscape of opportunity for Computational Intelligence methods and related techniques. This landscape can be observed from the wide-reaching view of population studies down to the level of genotype detail. In this introductory chapter, we provide a sweeping, by no means exhaustive, glimpse of the state of the art in this field at the different scales of data measurement and analysis. We do so by focusing mostly on examples from European research, some of which are the subject of the following chapters of the book.
    Peer Reviewed. Postprint (published version).